Methods and Arrangements for Sub-Pel Motion-Adaptive Image Processing

ABSTRACT

Fractional-pel accurate motion is widely used in video processing and coding. For sub-band processing and coding, fractional-pel accuracy is challenging since it is difficult to process general motion fields with temporal transforms. In prior work, integer-pel accurate motion-adaptive transforms (MAT) have been designed. The present invention extends these to fractional-pel accuracy. The transforms are such that they permit multiple references and generate multiple low-band coefficients. Moreover, they can incorporate a general interpolation filter such that the high-band coefficients produced by the transform can be generated with interpolation filters that are commonly used for sub-pel accurate motion-compensated prediction.

RELATED PATENT DOCUMENTS

This application claims priority, under 35 U.S.C. § 119(e), of U.S. Patent Application Ser. No. 62/638,851, entitled “Methods and Arrangements for Sub-Pel Motion-Adaptive Image Processing,” and filed on Mar. 5, 2018, which is fully incorporated herein by reference.

FIELD

The present invention relates generally to processing and coding of image sequences, and more particularly to encoding and decoding images using motion compensation.

BACKGROUND

Motion-compensated prediction is widely used in image sequence processing and coding, though it is burdened by several disadvantages such as causal processing of images, challenging rate allocation problems due to dependent quantization, and limited scalability. In recent years, new methods for representing groups of successive images have been developed while considering the motion among the successive images. Such representations offer perfect reconstruction, allow for multiresolution analysis and synthesis, and aim at compacting the energy of the video signal into a small number of representing coefficients.

A new method for constructing an orthogonal representation for general motion fields has been introduced in U.S. Pat. No. 8,346,000. The transforms that generate these coefficients are defined by a sequence of incremental transforms, which are realized by so-called Euler rotations. Examples are uni-directional transforms, bi-directional transforms, and basic half-pel accurate motion-compensated transforms which are limited to simple averaging interpolation filters. The problem of general sub-pel accurate motion-compensated transforms that support general interpolation filters is not solved in U.S. Pat. No. 8,346,000.

The present invention offers an efficient solution for general sub-pel accurate motion-compensated transforms that support general interpolation filters. First, general interpolation filters are introduced as a constraint in the high-dimensional transform. Second, the transform is obtained in two steps, namely energy compaction and energy redistribution.

SUMMARY

Fractional-pel accurate motion is widely used in video processing and coding. For sub-band processing and coding, fractional-pel accuracy is challenging since it is difficult to handle general motion fields with temporal transforms. In the prior invention U.S. Pat. No. 8,346,000, we designed integer-accurate motion-adaptive transforms (MAT) which can transform integer-accurate motion-connected coefficients. In the present invention, we extend the integer MAT to fractional-pel accuracy. The integer MAT allows only one reference coefficient to be the low-band coefficient. In the present invention, we design the transform such that it permits multiple references and generates multiple low-band coefficients. In addition, our fractional-pel MAT can incorporate a general interpolation filter into the basis vector, such that the high-band coefficients produced by the transform can be generated with interpolation filters that are commonly used for sub-pel accurate motion-compensated prediction. The fractional-pel MAT offers perfect reconstruction, orthogonality, and improved coding efficiency.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts an encoding and decoding system. The motion vectors (MV) and transform coefficients are encoded and transmitted.

FIG. 2 depicts a sequence of incremental transforms T₁, T₂, . . . , T_(K) that decorrelates x and outputs y.

FIG. 3A depicts the two orthogonal vectors t₁ and t₃ in a 3-dimensional space.

FIG. 3B depicts the mapping from f₁ to t₂ using the Gram-Schmidt algorithm in (10) and (11).

FIG. 4 depicts the energy compaction and energy redistribution steps.

FIG. 5 depicts the distribution of energy E into x₁ and x₂ according to h₁ and h₂. (a) Energy equally distributed to x₁ and x₂. (b) Energy unequally distributed to x₁ and x₂.

FIG. 6 depicts the integer position A and the eight half-pel positions 1 to 8.

DETAILED DESCRIPTION

The present invention is directed to a compact representation of image sequences and related approaches, their uses and systems for the same.

Embodiment 1

The present invention describes orthogonal motion-adaptive transforms (MAT) which represent the image sequences in a compact representation, while allowing energy conservation and fractional-pel motion accuracy with arbitrary interpolation filters.

A specific implementation of the compact representation is to compact the energy of the pixels into fewer pixels. An n-dimensional MAT generates n−1 energy-compacted lowband coefficients and one energy-removed highband coefficient. The interpolation filter is incorporated in the MAT as one basis vector to generate the energy-removed coefficient.

Embodiment 2

This embodiment describes the energy compaction step of MAT. With this step, one energy-compacted lowband coefficient is generated. It also generates n−1 energy-removed highband coefficients, where the last one of them is determined by the interpolation filter.

The first basis vector of the energy compaction transform is determined by scale factors. The last basis vector is determined by an interpolation filter. The remaining basis vectors can be found by, e.g., the Gram-Schmidt orthogonalization algorithm.

Scale factors are used to track the energy compaction under the assumption of ideal motion. Let x=[x₁, x₂, . . . , x_(n)]^(T) be a vector of coefficients connected by motion. Let the vector of scale factors associated with x be c=[c₁, c₂, . . . , c_(n)]^(T). Each x_(i), (i∈{1, 2, . . . , n}), can also be considered as a lowband coefficient. The scale factor c_(i) is used to represent the compacted energy in the coefficient as x_(i)=c_(i)x_(i′), where x_(i′) is the original intensity value. Ideal motion assumes that x_(1′)=x_(2′)= . . . =x_(n′)=x′, i.e., these n pixels have the same original intensity x′. Then, the input coefficients can be expressed as

x=x′c.   (1)

In the following, a simple example with two coefficients and a Haar transform is used to illustrate the use of scale factors. Let x₁ and x₂ be the original intensity values, x₁=x₂=x′. If we compact the energy of x₁ and x₂ into one lowband coefficient y₁, i.e.,

$\begin{matrix} {{\begin{bmatrix} y_{1} \\ y_{2} \end{bmatrix} = {{{\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ {- 1} & 1 \end{bmatrix}}\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}} = \begin{bmatrix} {\sqrt{2}x^{\prime}} \\ 0 \end{bmatrix}}},} & (2) \end{matrix}$

the output lowband coefficient y₁=√{square root over (2)}x′ becomes a scaled x′ with a factor √{square root over (2)}. The scale factor of y₁ is √{square root over (2)}, which is determined by the factor of x′ in (2). In general, y₁ is likely to be used further in hierarchical transforms. Thus, it is helpful to track the energy compaction of each lowband coefficient.

Similarly, if the energy of n pixels of x is compacted to one lowband coefficient, the corresponding scale factor is √{square root over (n)}. The scale factors are only determined by the motion information. They do not require extra information to be encoded.
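As a non-limiting illustration, the following sketch reproduces the Haar case of (2) numerically and checks that compacting n equal pixels into one lowband coefficient yields a scale factor of √n. The helper name `compact_equal` and the sample intensity are assumptions made for this illustration only.

```python
import math

def compact_equal(x):
    """Lowband coefficient obtained by projecting x onto [1, ..., 1] / sqrt(n)."""
    return sum(x) / math.sqrt(len(x))

x_prime = 5.0  # hypothetical original intensity shared by all motion-connected pixels

# Two-pixel Haar case of equation (2).
y1 = (x_prime + x_prime) / math.sqrt(2)   # lowband coefficient
y2 = (-x_prime + x_prime) / math.sqrt(2)  # highband coefficient
scale_2 = y1 / x_prime                    # scale factor of y1

# Four equal pixels compacted into one lowband coefficient.
scale_4 = compact_equal([x_prime] * 4) / x_prime

print(scale_2, y2, scale_4)
```

Under these assumptions the highband coefficient vanishes, and the scale factor depends only on how many pixels were compacted, not on their intensity.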

Let T be an n×n transform matrix, and y=[y₁, y₂, . . . , y_(n)]^(T) the output. The transform gives y=T^(T)x. The transform compacts the energy into one lowband coefficient and produces n−1 highband coefficients.

With the assumption of ideal motion (1), the aim is to find an orthonormal transform matrix T that perfectly compacts the energy of x into one lowband coefficient. Let t₁, t₂, . . . , t_(n) be the basis vectors of T, where t₁ represents the lowband vector and t_(n) the highest highband vector. The output coefficients are

y _(i) =x′t _(i) ^(T) c, for i=1, . . . , n.   (3)

The first coefficient y₁=x′t₁ ^(T)c is designed to capture the total energy of the signal x. Thus, t₁ needs to be collinear with c,

$\begin{matrix} {t_{1} = \frac{c}{\|c\|_{2}}.} & (4) \end{matrix}$

Then, y₁=x′√{square root over (c^(T)c)} contains the total energy of x, and no energy is left in other dimensions. Since t₁ represents one dimension in the n-dimensional space, and all the other basis vectors t₂, . . . , t_(n) are orthogonal to t₁, all highband coefficients y₂, . . . , y_(n) are zero. With this, the transform T is able to compact the energy perfectly. The constraint of t₁ in (4) is referred to as the subspace constraint.
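The subspace constraint can be checked numerically. The sketch below is illustrative only: it builds t₁ from an assumed scale-factor vector, forms an ideal-motion input according to (1), and confirms that the single lowband coefficient carries the total energy. The function name `lowband_vector` and all numeric values are hypothetical.

```python
import math

def lowband_vector(c):
    """Equation (4): t1 is the scale-factor vector normalized to unit length."""
    norm = math.sqrt(sum(ci * ci for ci in c))
    return [ci / norm for ci in c]

c = [1.0, math.sqrt(2.0), 1.0]   # hypothetical scale factors
x_prime = 3.0                    # hypothetical original intensity
x = [ci * x_prime for ci in c]   # ideal motion, equation (1)

t1 = lowband_vector(c)
y1 = sum(a * b for a, b in zip(t1, x))   # lowband coefficient
energy_in = sum(v * v for v in x)        # total energy of the input

print(y1, y1 * y1, energy_in)
```

Because y₁² equals the input energy here, every vector orthogonal to t₁ must produce a zero coefficient for this input.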

If x deviates from ideal motion, i.e., x₁, x₂, . . . , x_(n) are affected by noise, it will not give perfect energy compaction into one coefficient. However, the subspace constraint t₁ is kept as it reflects ideal energy compaction for ideal motion.

Next, the highband vectors are constructed. The highband vectors need to be orthogonal to t₁ and are not unique. For fractional-pel accurate motion compensation, an interpolation filter is used over several reference pixel values to better approximate the current target pixel value. Hence, one solution is to design t_(n) based on the interpolation filter.

Consider the input x=[x₁, x₂, . . . , x_(n)]^(T), where the first n−1 coefficients x₁, x₂, . . . , x_(n−1) are the integer-sample references for the target x_(n). The first n−1 coefficients can be viewed as the reference pixel values in the reference frame which are used to generate an interpolation value. Let an interpolation filter be h=[h₁, h₂, . . . , h_(n−1)], where Σ_(i=1) ^(n−1) h_(i)=1. The interpolated value is {circumflex over (x)}_(n)=Σ_(i=1) ^(n−1) h_(i)x_(i), and the approximation error between the interpolated value and the target is x_(n)−{circumflex over (x)}_(n)=x_(n)−Σ_(i=1) ^(n−1) h_(i)x_(i).

When using an orthonormal transform, the energy of the highband-to-be x_(n) is expected to be removed as much as possible. In the transform, the last highband coefficient is given by the last basis vector. Thus, the interpolation filter is incorporated into the transform. A first approach is to form a basis vector as t_(n)=[−h, 1]^(T). This generates a highband coefficient

y _(n) =t _(n) ^(T) x=−Σ _(i=1) ^(n−1) h _(i) x _(i) +x _(n),   (5)

which is consistent with the approximation error defined above.

The motion-adaptive transforms consider scale factors by design. Assuming ideal motion, the input signal is expressed as x=[c₁x′, c₂x′, . . . , c_(n)x′]^(T). To reuse this concept, we use the scale factors to adjust the coefficients of the interpolation filter. Then, the last basis vector t_(n) is

$\begin{matrix} {{t_{n} = \left\lbrack {{- \frac{h_{1}}{c_{1\;}}},{- \frac{h_{2}}{c_{2}}},\ldots \mspace{14mu},{- \frac{h_{n - 1}}{c_{n - 1}}},\frac{1}{c_{n}}} \right\rbrack^{T}},} & (6) \end{matrix}$

which can be normalized to

$t_{n} = \frac{t_{n}}{\|t_{n}\|_{2}}.$

For non-ideal motion, the high-band coefficient y_(n) will reflect the approximation error.

Note that the basis vector t_(n) is orthogonal to t₁, as Σ_(i=1) ^(n−1) h_(i)=1.
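The orthogonality noted above can be verified with a short sketch. It is a minimal illustration under stated assumptions: the 4-tap filter, the scale factors, and the helper names `unit` and `highband_vector` are hypothetical and chosen only so that the filter taps sum to one.

```python
import math

def unit(v):
    """Normalize a vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def highband_vector(h, c):
    """Equation (6): filter taps divided by reference scale factors, then normalized."""
    raw = [-hi / ci for hi, ci in zip(h, c[:-1])] + [1.0 / c[-1]]
    return unit(raw)

h = [-0.125, 0.625, 0.625, -0.125]  # hypothetical 4-tap filter, taps sum to one
c = [1.0, 2.0, 1.5, 1.0, 1.2]       # hypothetical scale factors (4 references + target)

t1 = unit(c)                 # lowband vector, equation (4)
tn = highband_vector(h, c)   # highband vector, equation (6)

x_prime = 80.0
x = [ci * x_prime for ci in c]              # ideal-motion input, equation (1)
inner = sum(a * b for a, b in zip(t1, tn))  # should vanish: t_n is orthogonal to t1
y_n = sum(a * b for a, b in zip(tn, x))     # highband coefficient, zero for ideal motion

print(inner, y_n)
```

Both printed values are zero up to floating-point rounding, which reflects that the division by the scale factors cancels the scaling of the input.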

For vertical or horizontal fractional-pel positions, the references are aligned in one dimension. The interpolation filter can be directly used to form t_(n). For non-vertical/horizontal fractional-pel positions, the references are distributed in two dimensions and t_(n) cannot be obtained directly. For example, to interpolate a diagonal half-pel (HP) position, HEVC first uses the 8-tap interpolation filter along the rows to generate eight horizontal HP values, and then uses the 8-tap filter again along the columns to filter these eight horizontal HP values to generate the final interpolated value. Thus, to obtain t_(n), we need to consider the interpolation filters in both dimensions.

Let h_(h) be the p-tap interpolation filter horizontally, and h_(v) the q-tap filter vertically. Let H=h_(h) ^(T)h_(v) be the filter coefficient matrix of size p×q and X the matrix of references of the same size. The interpolated value is {circumflex over (x)}_(n)=Σ_(ij) H_(ij)X_(ij). Similar to the one-dimensional case, the highband coefficient is

y _(n) =x _(n)−{circumflex over (x)}_(n) =x _(n)−Σ_(ij) H _(ij) X _(ij) =t _(n) ^(T) x.   (7)

Reshaping H and X into vectors gives

t _(n) =[−H ₁₁ , −H ₁₂ , . . . , −H _(pq), 1]^(T),   (8)

x=[X ₁₁ , X ₁₂ , . . . , X _(pq) , x _(n)]^(T),   (9)

Again, normalize t_(n) to

$t_{n} = \frac{t_{n}}{\|t_{n}\|_{2}}.$

Since t_(n) is of dimension (pq+1)×1, this approach is not separable. When scale factors are used, an approach similar to (6) is necessary.
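The reshaping in (8) and (9) can be sketched as follows. This is an illustration without scale factors, using the HEVC 8-tap half-pel coefficients quoted elsewhere in this description for both dimensions. The function name `highband_vector_2d` and the constant test block are assumptions.

```python
import math

def highband_vector_2d(h_h, h_v):
    """Form H = h_h^T h_v, flatten it row by row as in (8), append 1, and normalize."""
    H = [[a * b for b in h_v] for a in h_h]            # p x q coefficient matrix
    raw = [-H_ij for row in H for H_ij in row] + [1.0]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

tap8 = [w / 64.0 for w in (-1, 4, -11, 40, 40, -11, 4, -1)]
t_n = highband_vector_2d(tap8, tap8)

# Constant 8x8 reference block and an equal target value, flattened as in (9).
x = [7.0] * 64 + [7.0]
y_n = sum(a * b for a, b in zip(t_n, x))  # highband coefficient of equation (7)

print(len(t_n), y_n)
```

The resulting vector has pq+1 = 65 entries, and the highband coefficient vanishes for the constant block because the entries of H sum to one.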

In an n-dimensional space, two basis vectors are determined by t₁ and t_(n). The basis of the remaining (n−2)-dimensional subspace is not unique for n>3. There are many ways to find a basis for the remaining subspace, e.g., decomposing the n-dimensional space using Gram-Schmidt or finding a certain matrix with its eigenvector matrix satisfying these constraints. Different approaches give different sets of t₂, . . . , t_(n−1). One example is to use an approach based on Gram-Schmidt decomposition.

Let an n-dimensional space be spanned by orthonormal vectors f₁, . . . , f_(n) (f_(j)∈R^(n) for j=1, . . . , n). We decompose this space for the given vectors t₁ and t_(n) using Gram-Schmidt orthonormalization. Let the projection of vector f_(j) onto the vector t_(i) be proj(f_(j), t_(i))=f_(j) ^(T)t_(i)·t_(i). For f_(j), we find a vector that is orthogonal to t₁, i.e., the orthogonal vector e_(j)=f_(j)−proj(f_(j),t₁). By subtracting the projections proj(f₁, t₁), . . . ,proj(f_(n),t₁), we reduce the n-dimensional space by one dimension. Since t_(n)⊥t₁, t_(n) is a vector in the (n−1)-dimensional subspace. Again, reduce the dimensionality by subtracting the projections of e₁, . . . , e_(n) onto t_(n). Then, we obtain an (n−2)-dimensional subspace. This subspace is orthogonal to both t₁ and t_(n). The remaining basis vectors can be easily found within this subspace by using Gram-Schmidt, i.e.,

$\begin{matrix} {{\tilde{e}_{j} = f_{j - 1} - \sum_{i \in \{1,\ldots,j - 1,n\}} \mathrm{proj}\left( f_{j - 1},t_{i} \right)},} & (10) \\ {{t_{j} = \frac{\tilde{e}_{j}}{\|\tilde{e}_{j}\|_{2}}},\quad \text{for } j = 2,\ldots,n - 1.} & (11) \end{matrix}$

Equation (10) implies that {tilde over (e)}_(j) is obtained by subtracting all the projected parts of f_(j−1) onto t₁, . . . , t_(j−1) and t_(n). The basis vectors t₁, . . . , t_(j−1) and t_(n) have been orthogonalized in the previous steps. Thus, {tilde over (e)}_(j) is guaranteed to be orthogonal to all the previously calculated basis vectors.

FIG. 3A depicts an example of two basis vectors t₁ and t₃ in a 3-dimensional space. t₁ and t₃ are orthogonal. FIG. 3B depicts the mapping from f₁ to t₂. The component {tilde over (e)}₂ is obtained by subtracting from f₁ the projections proj(f₁, t₁) and proj(f₁, t₃) . Then, t₂ is obtained by normalizing {tilde over (e)}₂.

The advantage of the Gram-Schmidt algorithm is that the algorithm does not modify the set of vectors if the input set of vectors is already optimal. That is, if the input vectors are orthogonal and decorrelate the signal (i.e., the KLT basis), the algorithm outputs the same set of vectors. Assume that f₁, . . . , f_(n) are the KLT basis vectors and that t₁=f₁ and t_(n)=f_(n). We need to find vectors that are orthogonal to f₁ and f_(n). Since the KLT basis vectors are orthogonal to each other, we always obtain proj(f_(j−i), t_(i))=0 in (10), and thus, t_(j)=f_(j). That is, the algorithm will not degrade the performance of an efficient initial orthogonal basis. In general, it is possible to choose an arbitrary set {f₁, . . . , f_(n)} for decomposition. Each will lead to a possible decomposition.
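A minimal sketch of the completion step is given below, assuming the standard unit vectors as the initial set {f₁, . . . , f_(n)}. As a simplification that is not part of the description, candidate vectors whose residual is numerically zero are skipped rather than mapped by the fixed index j−1 of (10). The names `dot` and `complete_basis` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def complete_basis(t1, tn, tol=1e-10):
    """Return [t1, t2, ..., t_{n-1}, tn], filling the middle vectors by Gram-Schmidt."""
    n = len(t1)
    found = []  # t2, ..., t_{n-1}
    for j in range(n):
        if len(found) == n - 2:
            break
        f = [1.0 if i == j else 0.0 for i in range(n)]  # standard unit vector
        e = f[:]
        # Subtract the projections onto every basis vector fixed so far.
        for t in [t1] + found + [tn]:
            p = dot(f, t)
            e = [ei - p * ti for ei, ti in zip(e, t)]
        norm = math.sqrt(dot(e, e))
        if norm > tol:  # skip candidates already inside the span
            found.append([ei / norm for ei in e])
    return [t1] + found + [tn]

s3, s6 = math.sqrt(3), math.sqrt(6)
t1 = [1 / s3, 1 / s3, 1 / s3]
t3 = [-1 / s6, -1 / s6, 2 / s6]
T = complete_basis(t1, t3)
print(T[1])
```

For these two vectors, the middle vector is proportional to [1, −1, 0], which is the mapping from f₁ to t₂ depicted in FIG. 3B.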

Embodiment 3

This embodiment describes the energy redistribution step of MAT. With this, the energy is redistributed from one coefficient to k (1≤k<n) coefficients.

The energy compaction process in Embodiment 2 compacts the energy into one coefficient and determines a highband coefficient. For fractional-pel MAT, there are two major challenges at this point. First, the transform in Embodiment 2 compacts the energy to only one coefficient. Since fractional-pel motion estimation refers to multiple references, the compacted energy needs to be shifted to other references. Second, since there will be multiple lowband coefficients, the scale factors associated with these lowband coefficients need to be determined.

The main concept of creating multiple lowband coefficients includes two steps: First, compact the energy of the input signal to one coefficient, and second, redistribute the energy from one coefficient to multiple coefficients. The energy should be conserved. Thus, the transforms in the two steps need to be orthonormal.

Consider x as the input and y the output of the energy-compacting transform. Assume y_(l) to be the lowband coefficient. The energy of y_(l) is redistributed to k energy-redistributed coefficients {tilde over (x)}_(k)=[{tilde over (x)}₁, . . . , {tilde over (x)}_(k)]^(T). For fractional-pel accurate motion, k=n−1, i.e., the energy is redistributed to all the n−1 references. In general, 1≤k<n and k∈Z. Let U_(k) be the transform for energy redistribution. A k-dimensional orthonormal transform U_(k) is used to redistribute the energy to {tilde over (x)}_(k),

{tilde over (x)}_(k) =U _(k) ^(T) y _(k),   (12)

where y_(k) denotes the first k elements of y.

The energy compaction is given by y=T^(T)x. As T is orthonormal, the inverse process of energy compaction is then x=(T^(T))⁻¹y=Ty. This inverse process can be viewed as redistributing the energy back to n coefficients. Using the same idea, if {tilde over (T)}_(k) is the transform that can compact k coefficients into one lowband coefficient according to y_(k)={tilde over (T)}_(k) ^(T){tilde over (x)}_(k), we simply let U_(k) ^(T)={tilde over (T)}_(k) to achieve energy redistribution,

{tilde over (x)}_(k)={tilde over (T)}_(k) y _(k),   (13)

To determine {tilde over (T)}_(k), the scale factors of {tilde over (x)}_(k) are needed. Let {tilde over (T)}_(k)=[{tilde over (t)}₁, . . . , {tilde over (t)}_(k)]. Similar to T, the lowband vector {tilde over (t)}₁ needs to satisfy the subspace constraint determined by the scale factors {tilde over (c)}_(k) of {tilde over (x)}_(k), i.e.,

$\tilde{t}_{1} = \frac{\tilde{c}_{k}}{\|\tilde{c}_{k}\|_{2}}.$

Given {tilde over (c)}_(k), the matrix of {tilde over (T)}_(k) can be constructed using, e.g., Gram-Schmidt orthonormalization.

In conclusion, in the first step, T^(T) compacts the energy of the input to one energy-compacted coefficient. In the second step, {tilde over (T)}_(k) redistributes the compacted energy to k references for further processing. The final n-dimensional output is

$\begin{matrix} {\begin{bmatrix} {\overset{\sim}{x}}_{k} \\ y_{k + 1} \\ \vdots \\ y_{n} \end{bmatrix} = {\begin{bmatrix} {\overset{\sim}{T}}_{k} & 0_{k \times {({n - k})}} \\ 0_{{({n - k})} \times k} & I_{n - k} \end{bmatrix}\begin{bmatrix} y_{k} \\ y_{k + 1} \\ \vdots \\ y_{n} \end{bmatrix}}} & (14) \\ {\mspace{79mu} {{= {\begin{bmatrix} {\overset{\sim}{T}}_{k} & 0_{k \times {({n - k})}} \\ 0_{{({n - k})} \times k} & I_{n - k} \end{bmatrix}T^{T}x}},}} & (15) \end{matrix}$

where 0_(k×(n−k)) and 0_((n−k)×k) are zero matrices and I_(n−k) the identity matrix. In the fractional-pel case, {tilde over (T)}_(k) with k=n−1 can be viewed as rotation around the nth basis vector. Constructing {tilde over (T)}_(k) does not affect the highband vector t_(n).
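The two-step structure of (14) and (15), together with its inverse, can be sketched as follows. The example matrices are taken from the 3-dimensional half-pel example in this description. The function names `forward` and `inverse` and the non-ideal input values are assumptions made for illustration.

```python
import math

def forward(T_cols, Tk_rows, x):
    """Compaction y = T^T x, then redistribution of the first k outputs with T~_k."""
    k = len(Tk_rows)
    y = [sum(t[i] * x[i] for i in range(len(x))) for t in T_cols]
    x_tilde = [sum(Tk_rows[r][j] * y[j] for j in range(k)) for r in range(k)]
    return x_tilde + y[k:]

def inverse(T_cols, Tk_rows, z):
    """Undo the redistribution with T~_k^T, then undo the compaction with x = T y."""
    k, n = len(Tk_rows), len(z)
    y_k = [sum(Tk_rows[r][j] * z[r] for r in range(k)) for j in range(k)]
    y = y_k + z[k:]
    return [sum(T_cols[j][i] * y[j] for j in range(n)) for i in range(n)]

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
T_cols = [[1 / s3, 1 / s3, 1 / s3],      # t1
          [1 / s2, -1 / s2, 0.0],        # t2
          [-1 / s6, -1 / s6, 2 / s6]]    # t3 (highband vector)
Tk_rows = [[1 / s2, 1 / s2],
           [1 / s2, -1 / s2]]            # rows of T~_2

x = [10.0, 12.0, 11.5]                   # hypothetical non-ideal input
z = forward(T_cols, Tk_rows, x)
x_rec = inverse(T_cols, Tk_rows, z)
print(z, x_rec)
```

Since both steps are orthonormal, the output energy equals the input energy and the input is recovered exactly up to rounding. The last output equals t₃^(T)x, which shows that the redistribution leaves the highband coefficient untouched.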

The scale factors {tilde over (c)}_(k) are updated to track the energy of the lowbands. {tilde over (c)}_(k) is related to energy that is to be distributed to each lowband coefficient. One solution is to redistribute the lowband energy equally to the k coefficients, and thus, update the scale factors equally. Alternatively, since nearby references contribute more to the interpolated value according to the interpolation filter, it is reasonable to redistribute more energy to nearby references and less energy to faraway references.

A specific updating example is given below. Consider a simple 3-dimensional example with input x=[x₁, x₂, x₃]^(T), where x₁ and x₂ are the references of x₃. The energy of x₃ is distributed to the two references x₁ and x₂, which become {tilde over (x)}₁ and {tilde over (x)}₂, respectively. Let the interpolation filter coefficients associated with x₁ and x₂ be h₁ and h₂, respectively. The energy is expected to be equally distributed to these two coefficients if there is no particular preference for any of the two, i.e., h₁=h₂. As shown in FIG. 5(a), the quarter circle represents the energy E=x₃ ², and the 45° line divides E into two equal parts. Let s₁ and s₂ denote the square roots of the energy E distributed to x₁ and x₂, respectively. It holds that E=s₁ ²+s₂ ². Equal energy distribution gives |s₁|=|s₂|=√{square root over (2E)}/2.

FIG. 5(b) depicts the case where the energy is not equally distributed. To find s₁ and s₂, we consider the geometric property of |h₁| and |h₂|. Let the line (0,0)−(|h₁|, |h₂|) intersect the quarter circle E, and let the coordinates of the intersection point be s₁ and s₂. Since the intersection point lies on the circle, E=s₁ ²+s₂ ², and the energy is preserved. Note that the filter coefficients can be negative, e.g., the 8-tap filter coefficients are [−1,4, −11,40,40, −11,4, −1]/64 in HEVC. However, energies do not have negative values. Thus, we use the absolute values of the filter coefficients. It follows that

$\begin{matrix} {\frac{s_{1}}{s_{2}} = {\frac{h_{1}}{h_{2}}.}} & (16) \end{matrix}$

A small ratio |h₁|/|h₂| means that x₁ contributes less to the interpolation; thus, it is reasonable to distribute less energy to x₁, and vice versa. Then, s₁ and s₂ are

$\begin{matrix} {s_{1}^{2} = {{\frac{h_{1}^{2}}{h_{1}^{2} + h_{2}^{2}}E\mspace{14mu} {and}\mspace{14mu} s_{2}^{2}} = {\frac{h_{2}^{2}}{h_{1}^{2} + h_{2}^{2}}{E.}}}} & (17) \end{matrix}$

Now, consider the ideal motion assumption that the input is represented as x=[x₁, x₂, x₃]^(T)=[c₁, c₂, c₃]^(T)x′, where x′ is the original pixel value and c₁, c₂, c₃ are the scale factors. The energy of x₃ is E=x₃ ²=c₃ ²x′². From (17), we obtain that

$\begin{matrix} {s_{1}^{2} = {{\frac{h_{1}^{2}}{h_{1}^{2} + h_{2}^{2}}c_{3}^{2}x^{\prime \; 2}\mspace{14mu} {and}\mspace{14mu} s_{2}^{2}} = {\frac{h_{2}^{2}}{h_{1}^{2} + h_{2}^{2}}c_{3}^{2}{x^{\prime \; 2}.}}}} & (18) \end{matrix}$

The energies of {tilde over (x)}₁ and {tilde over (x)}₂ are updated to

$\begin{matrix} {{{\overset{\sim}{E}}_{1} = {{x_{1}^{2} + s_{1}^{2}} = {\left( {c_{1}^{2} + {\frac{h_{1}^{2}}{h_{1}^{2} + h_{2}^{2}}c_{3}^{2}}} \right)x^{\prime \; 2}}}},{{\overset{\sim}{E}}_{2} = {{x_{2}^{2} + s_{2}^{2}} = {\left( {c_{2}^{2} + {\frac{h_{2}^{2}}{h_{1}^{2} + h_{2}^{2}}c_{3}^{2}}} \right)x^{\prime \; 2}}}},} & (19) \end{matrix}$

respectively. Let {tilde over (c)}₁ and {tilde over (c)}₂ be the scale factors of {tilde over (x)}₁ and {tilde over (x)}₂, respectively. Since {tilde over (E)}₁={tilde over (c)}₁ ²x′² and {tilde over (E)}₂={tilde over (c)}₂ ²x′², the scale factors are updated as

$\begin{matrix} {{\overset{\sim}{c}}_{1} = {{\left( {c_{1}^{2} + {\frac{h_{1}^{2}}{h_{1}^{2} + h_{2}^{2}}c_{3}^{2}}} \right)^{\frac{1}{2}}\mspace{14mu} {and}\mspace{14mu} {\overset{\sim}{c}}_{2}} = {\left( {c_{2}^{2} + {\frac{h_{2}^{2}}{h_{1}^{2} + h_{2}^{2}}c_{3}^{2}}} \right)^{\frac{1}{2}}.}}} & (20) \end{matrix}$

Note that the scale factors are only determined by the final energy. The intermediate variable y_(k) discussed in (13) does not affect the update of scale factors.

In general, when the energy E is distributed to k references, E=Σ_(j=1) ^(k) s_(j) ² can be viewed as a hypersphere. Extending the line origin−(|h₁|, . . . , |h_(k)|) such that it intersects the hypersphere, we find the coordinates of the intersection point. Similar to (16), we have |s₁|:|s₂|: . . . :|s_(k)|=|h₁|:|h₂|: . . . :|h_(k)|, and we obtain that

$\begin{matrix} {{s_{i}^{2} = {\frac{h_{i}^{2}}{\sum\limits_{j = 1}^{k}\; h_{j}^{2}}E}},{{{for}\mspace{14mu} i} = 1},\ldots \;,{k.}} & (21) \end{matrix}$

Under the ideal motion assumption, the energy can be expressed as E=c²x′², where c is the scale factor associated with E. The scale factor {tilde over (c)}_(i) of the ith reference is updated according to

$\begin{matrix} {{{\overset{\sim}{c}}_{i} = \left( {c_{i}^{2} + {\frac{h_{i}^{2}}{\sum\limits_{j = 1}^{k}\; h_{j}^{2}}c^{2}}} \right)^{\frac{1}{2}}},{{{for}\mspace{14mu} i} = 1},\ldots \;,{k.}} & (22) \end{matrix}$
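The update rule (22) can be illustrated with a short sketch. The function name `update_scale_factors` and the sample filter taps are assumptions; only the formula itself is taken from the description.

```python
import math

def update_scale_factors(c_refs, h, c_target):
    """Equation (22): add to each squared reference scale factor its share of c_target^2."""
    denom = sum(hj * hj for hj in h)
    return [math.sqrt(ci * ci + (hi * hi / denom) * c_target * c_target)
            for ci, hi in zip(c_refs, h)]

# Equal filter taps: the target energy is split evenly between the two references.
equal = update_scale_factors([1.0, 1.0], [0.5, 0.5], 1.0)

# Hypothetical unequal taps: the reference with the larger tap receives more energy.
unequal = update_scale_factors([1.0, 1.0], [0.75, 0.25], 1.0)

print(equal, unequal)
```

In both cases the sum of the squared updated scale factors equals the sum of all squared input scale factors, so the tracked energy is conserved. Because the taps enter squared, negative filter coefficients need no special handling.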

Here is a simple example to construct a half-pel MAT (HP-MAT) with two references. Let the input be x=[x₁, x₂, x₃]^(T), where x₁ and x₂ are the two references for x₃. Assume x₁=x₂=x₃=x are the original intensity values associated with scale factors of one. Since there are only two references, let the interpolation filter be h=[½, ½]. Then, the transform T is a 3×3 matrix, and the basis vectors can be determined using (4) and (6), i.e.,

$\begin{matrix} {t_{1} = {{{\frac{1}{\sqrt{3}}\left\lbrack {1,1,1} \right\rbrack}^{T}\mspace{14mu} {and}\mspace{14mu} t_{3}} = {{\frac{1}{\sqrt{6}}\left\lbrack {{- 1},{- 1},2} \right\rbrack}^{T}.}}} & (23) \end{matrix}$

Then, decomposing the 3-dimensional space, the remaining vector, which is orthogonal to both t₁ and t₃, is

$t_{2} = \frac{1}{\sqrt{2}}\left\lbrack {1, - 1,0} \right\rbrack^{T}.$

Thus,

$\begin{matrix} {T = {\left\lbrack {t_{1}\mspace{14mu} t_{2}\mspace{14mu} t_{3}} \right\rbrack = {\begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & {- \frac{1}{\sqrt{6}}} \\ \frac{1}{\sqrt{3}} & {- \frac{1}{\sqrt{2}}} & {- \frac{1}{\sqrt{6}}} \\ \frac{1}{\sqrt{3}} & 0 & \frac{2}{\sqrt{6}} \end{bmatrix}.}}} & (24) \end{matrix}$

With T, the energy of x is compacted to the first coefficient.

For energy redistribution, since there are only two references with equal filtering weights, the scale factors are updated according to (22) as

${\overset{\sim}{c}}_{1} = {{\sqrt{\frac{3}{2}}\mspace{14mu} {and}\mspace{14mu} {\overset{\sim}{c}}_{2}} = {\sqrt{\frac{3}{2}}.}}$

Then, the transform for redistribution is

$\begin{matrix} {{\overset{\sim}{T}}_{2} = {\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & {- \frac{1}{\sqrt{2}}} \end{bmatrix}.}} & (25) \end{matrix}$

It can be easily verified that T^(T)T=I and {tilde over (T)}₂ ^(T){tilde over (T)}₂=I.

Thus, the MAT matrix according to (15) is

$\begin{matrix} {T_{MAT} = {{\begin{bmatrix} {\overset{\sim}{T}}_{2} & 0 \\ 0 & 1 \end{bmatrix}T^{T}} = {\begin{bmatrix} {\frac{1}{\sqrt{6}} + \frac{1}{2}} & {\frac{1}{\sqrt{6}} - \frac{1}{2}} & \frac{1}{\sqrt{6}} \\ {\frac{1}{\sqrt{6}} - \frac{1}{2}} & {\frac{1}{\sqrt{6}} + \frac{1}{2}} & \frac{1}{\sqrt{6}} \\ {- \frac{1}{\sqrt{6}}} & {- \frac{1}{\sqrt{6}}} & \frac{2}{\sqrt{6}} \end{bmatrix}.}}} & (26) \end{matrix}$

The last row of T_(MAT) is given by t₃, which is the highband vector determined by c and h as shown in (6). Applying T_(MAT) to x=[x, x, x]^(T), we obtain the final output

$\begin{matrix} {\begin{bmatrix} {\overset{\sim}{x}}_{2} \\ y_{3} \end{bmatrix} = {{T_{MAT}x} = {{x\begin{bmatrix} \sqrt{\frac{3}{2}} \\ \sqrt{\frac{3}{2}} \\ 0 \end{bmatrix}}.}}} & (27) \end{matrix}$

The energy of x is compacted to two lowband references and the highband turns to zero.
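The entries of (26) and the output (27) can be checked numerically. The sketch below multiplies the block-diagonal redistribution matrix by T^(T) using plain lists. The variable names and the sample intensity are assumptions for illustration.

```python
import math

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

# Rows of T^T, i.e., the basis vectors t1, t2, t3 of equation (24).
T_t = [[1 / s3, 1 / s3, 1 / s3],
       [1 / s2, -1 / s2, 0.0],
       [-1 / s6, -1 / s6, 2 / s6]]

# Block-diagonal matrix of equation (15) with T~_2 from (25) and a trailing 1.
B = [[1 / s2, 1 / s2, 0.0],
     [1 / s2, -1 / s2, 0.0],
     [0.0, 0.0, 1.0]]

# T_MAT = B * T^T, equation (26).
T_mat = [[sum(B[r][m] * T_t[m][col] for m in range(3)) for col in range(3)]
         for r in range(3)]

x_val = 4.0  # hypothetical common intensity
out = [sum(T_mat[r][i] * x_val for i in range(3)) for r in range(3)]

for row in T_mat:
    print(row)
print(out)
```

The first two outputs equal √(3/2) times the input intensity, and the third is zero up to rounding, in agreement with (27).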

For HP accuracy, the half-pel MCOT (HP-MCOT) considers only two references, horizontally or vertically. HP-MCOT is a sequence of Euler rotations that rotates the signal step by step. Assume a 3-dimensional signal with scale factors c₁, c₂, c₃, and let the scale factors after the update be {tilde over (c)}₁, {tilde over (c)}₂. We implement the energy compaction step as x₂→x₁, x₃→x₁, and the redistribution step as x₁→x₂. The transform matrix of the energy compaction step in HP-MCOT is

$\begin{matrix} \begin{matrix} {H_{1} = \underset{\text{compaction } x_{3}\rightarrow\text{updated } x_{1}}{\underbrace{\begin{bmatrix} \frac{\|c_{12}\|}{\|c_{123}\|} & 0 & \frac{c_{3}}{\|c_{123}\|} \\ 0 & 1 & 0 \\ {- \frac{c_{3}}{\|c_{123}\|}} & 0 & \frac{\|c_{12}\|}{\|c_{123}\|} \end{bmatrix}}}\;\underset{\text{compaction } x_{2}\rightarrow x_{1}}{\underbrace{\begin{bmatrix} \frac{c_{1}}{\|c_{12}\|} & \frac{c_{2}}{\|c_{12}\|} & 0 \\ {- \frac{c_{2}}{\|c_{12}\|}} & \frac{c_{1}}{\|c_{12}\|} & 0 \\ 0 & 0 & 1 \end{bmatrix}}}} \\ {{= \begin{bmatrix} \frac{c_{1}}{\|c_{123}\|} & \frac{c_{2}}{\|c_{123}\|} & \frac{c_{3}}{\|c_{123}\|} \\ {- \frac{c_{2}}{\|c_{12}\|}} & \frac{c_{1}}{\|c_{12}\|} & 0 \\ {- \frac{c_{1}c_{3}}{\|c_{12}\|\|c_{123}\|}} & {- \frac{c_{2}c_{3}}{\|c_{12}\|\|c_{123}\|}} & \frac{\|c_{12}\|}{\|c_{123}\|} \end{bmatrix}},} \end{matrix} & (28) \end{matrix}$

where ∥c₁₂∥=√{square root over (c₁ ²+c₂ ²)} and ∥c₁₂₃∥=√{square root over (c₁ ²+c₂ ²+c₃ ²)}. The energy redistribution step of HP-MCOT is the same as that of MAT, since it is a two dimensional fixed matrix where one basis vector [{tilde over (c)}₁, {tilde over (c)}₂]^(T) is given and the other vector is orthogonal to the given one.

It can be seen from (28) that if c₁=c₂=c₃, H₁ is the transpose of T in (24) up to sign differences. However, if the scale factors are not equal, the third row of H₁ will be different from t_(n) as discussed in (6). Then, HP-MCOT gives a different transform matrix than HP-MAT. In higher dimensions these two transforms are also different, since the HP-MAT has a highband vector determined by the interpolation filter, while HP-MCOT does not have such a vector.
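The comparison between HP-MCOT and HP-MAT can be made concrete with the following sketch, which multiplies the two Euler rotations of (28) numerically and compares the third row of the product with the highband vector of (6). The function names and the unequal scale factors are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[r][m] * B[m][col] for m in range(len(B)))
             for col in range(len(B[0]))] for r in range(len(A))]

def hp_mcot_compaction(c1, c2, c3):
    """Product of the two Euler rotations of equation (28)."""
    n12 = math.sqrt(c1 * c1 + c2 * c2)
    n123 = math.sqrt(c1 * c1 + c2 * c2 + c3 * c3)
    R1 = [[c1 / n12, c2 / n12, 0.0],
          [-c2 / n12, c1 / n12, 0.0],
          [0.0, 0.0, 1.0]]                # compaction x2 -> x1
    R2 = [[n12 / n123, 0.0, c3 / n123],
          [0.0, 1.0, 0.0],
          [-c3 / n123, 0.0, n12 / n123]]  # compaction x3 -> updated x1
    return matmul(R2, R1)

def hp_mat_highband(h, c):
    """Normalized highband vector of equation (6) for two references."""
    raw = [-h[0] / c[0], -h[1] / c[1], 1.0 / c[2]]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

h = [0.5, 0.5]
diff_equal = max(abs(a - b) for a, b in
                 zip(hp_mcot_compaction(1.0, 1.0, 1.0)[2], hp_mat_highband(h, [1.0, 1.0, 1.0])))
diff_unequal = max(abs(a - b) for a, b in
                   zip(hp_mcot_compaction(1.0, 2.0, 1.0)[2], hp_mat_highband(h, [1.0, 2.0, 1.0])))
print(diff_equal, diff_unequal)
```

With equal scale factors the two highband vectors coincide, whereas with the assumed scale factors (1, 2, 1) they differ noticeably, which illustrates why the two transforms are not the same in general.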

Embodiment 4

This embodiment is a combination of Embodiment 2 and Embodiment 3.

FIG. 4 is a flow chart of corresponding operations that may be performed by MAT. The operations include obtaining an input of N coefficients 400, compacting energy to the first coefficient 402, and redistributing the energy to N−1 coefficients 404.

Embodiment 5

The application of Embodiments 1-4 is not limited to scalable video coding or temporal transforms. It can be applied to other areas where energy compaction is needed. One example is to apply it in the spatial domain where hierarchical spatial transforms are needed.

What is claimed is:
 1. A method for processing or coding a set of N images, where N is greater than one, where each pixel of the N images is associated with a scale factor, and where the images are linked by sub-pel accurate motion fields, where at least n−1 pixels of a first image are used to sub-pel motion-compensate at least one pixel of a second image, where n is greater than two, and where any of the n−1 pixels of a first image can be used more than once to motion-compensate other pixels in any of the N−1 images, the method comprising: using non-averaging, but general filter coefficients to scale n−1 pixels of a first image; using the scale factors of n−1 pixels of a first image and the scale factor of a pixel of a second image to consider any prior usage for motion compensation; and determining an n×n linear transform for the n−1 pixels of a first image and the linked pixel of a second image while considering n−1 filter coefficients and n scale factors.
 2. The method of claim 1, where the n×n linear transform is an orthogonal transform.
 3. The method of claim 1, where the n×n linear transform is constructed by Gram-Schmidt orthogonalization.
 4. The method of claim 1, where the n×n linear transform is accomplished in two steps: first, an n×n orthogonal transform is applied that compacts the energy of the n pixel values into one of the n−1 pixels of a first image; second, the energy of the n−1 pixels of a first image is redistributed among the n−1 pixels of a first image by using an (n−1)×(n−1) orthogonal transform. 